Basic types of distributional models
Research in distributional semantics has grown rapidly in recent years, and the proposed models have increased substantially in number and variety. They differ in many ways: what portion of the data is chosen to build the model, how it is transformed into vectors, what further modifications are applied to those vectors, and how they are used in practice.

Some researchers distinguish between "distributional" and "distributed" models: the former term refers to models built on context counts, while the latter mainly refers to neural models. However, both types derive their representations from the contexts of words as recorded in corpora, so both rely on the distributional hypothesis. In addition, both types are distributed in the sense that they represent words by real-valued vectors. Hence, both are distributional and distributed.

Context

Whether a model computes statistics directly or employs a machine learning algorithm, it is the kind of context that determines what information the model can capture.

Syntagmatic vs. paradigmatic

Sahlgren<ref>Sahlgren, M. (2006). The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Stockholm University.</ref> has made a useful distinction between the [[Syntagmatic and paradigmatic relation|''syntagmatic'' and ''paradigmatic'']] content of a model:

A word-space model accumulated from co-occurrence information contains syntagmatic relations between words, while a word-space model accumulated from information about shared neighbors contains paradigmatic relations between words.

Syntagmatic and paradigmatic information are inherently different and, although not completely orthogonal, overlap very little. Experimental results in chapter 9 of Sahlgren's dissertation support this argument.
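As a rough illustration of this distinction, the following Python sketch builds both kinds of count vectors from a toy corpus: vectors that count how often a word occurs in each context (the syntagmatic view) and vectors that count which words appear next to it (the paradigmatic view). The corpus, the function names `context_count_vectors` and `cooccurrence_vectors`, and the `window` parameter are assumptions made for this example; they do not come from Sahlgren's work.

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical): each inner list is one context, here a sentence.
corpus = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "water"],
    ["the", "cat", "chases", "the", "dog"],
]

def context_count_vectors(contexts):
    """Map each word to [count in context 0, count in context 1, ...]."""
    vectors = defaultdict(lambda: [0] * len(contexts))
    for i, context in enumerate(contexts):
        for word in context:
            vectors[word][i] += 1
    return dict(vectors)

def cooccurrence_vectors(contexts, window=1):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for context in contexts:
        for pos, word in enumerate(context):
            lo = max(0, pos - window)
            hi = min(len(context), pos + window + 1)
            for j in range(lo, hi):
                if j != pos:  # a word is not its own neighbor
                    vectors[word][context[j]] += 1
    return dict(vectors)

by_context = context_count_vectors(corpus)
by_neighbor = cooccurrence_vectors(corpus, window=1)

print(by_context["cat"], by_context["dog"])    # occurrences per sentence
print(by_neighbor["cat"], by_neighbor["dog"])  # neighboring-word counts
```

In this toy data "cat" and "dog" appear together in only one sentence, yet they share the neighbors "the" and "drinks", which is the kind of shared-neighbor evidence the paradigmatic view relies on.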
Among statistical models, those in which a word is represented as a vector <math>\overrightarrow{v} = (c_1, c_2, \ldots, c_n)</math>, where <math>c_i</math> is the number of occurrences of the word in the ''i''th context (e.g. a sentence, a paragraph, or a document), capture syntagmatic information. Models in which a word is represented as <math>\overrightarrow{v} = (w_1, w_2, \ldots, w_n)</math>, where <math>w_i</math> is the number of co-occurrences of the word with the ''i''th word, capture paradigmatic information. Neural models such as those of Bengio et al.<ref>Bengio, Yoshua, et al. "A neural probabilistic language model." Innovations in Machine Learning. Springer Berlin Heidelberg, 2006. 137-186.</ref> and Mikolov et al.<ref>Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. 2010.</ref><ref>Mikolov, Tomas, et al. "Extensions of recurrent neural network language model." Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE, 2011.</ref> often derive their vectors from the summation of a word's neighborhood and therefore take a paradigmatic view of word meaning.

Part-of-speech

According to Turney (2012)<ref>Turney, P. D. (2012). Domain and Function: A Dual-Space Model of Semantic Relations and Compositions. Journal of Artificial Intelligence Research, 44, 533–585.</ref>:

The intuition behind domain space is that the domain or topic of a word is characterized by the nouns that occur near it.

and

The concept of function space is that the function or role of a word is characterized by the syntactic context that relates it to the verbs that occur near it.

Here the "domain" space seems to capture "association", while the "function" space seems to capture "similarity". However, this distinction is rarely used.

Linear vs. dependency-based contexts

Padó & Lapata (2007)<ref>Padó, S., & Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161-199.</ref> observed that dependency-based contexts outperformed linear contexts in the tasks of detecting synonymy relations and acquiring prevalent senses for polysemous words.
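To make the contrast concrete, the sketch below extracts linear (window) contexts and dependency-based contexts for one word of a toy sentence. The sentence, its hand-written dependency arcs, the relation labels, and the `word/relation` context format (with `-1` marking the inverse direction) are all assumptions chosen for illustration; they are not taken from Padó & Lapata's system, and a real pipeline would obtain the arcs from a parser.

```python
# Toy sentence with a hand-written dependency analysis (hypothetical,
# not the output of a real parser).
tokens = ["the", "old", "scientist", "discovers", "a", "distant", "star"]

# Each arc is (head index, relation label, dependent index).
arcs = [
    (2, "det", 0),
    (2, "amod", 1),
    (3, "nsubj", 2),
    (3, "dobj", 6),
    (6, "det", 4),
    (6, "amod", 5),
]

def linear_contexts(tokens, pos, window=2):
    """Words at most `window` positions to the left or right of tokens[pos]."""
    lo = max(0, pos - window)
    hi = min(len(tokens), pos + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != pos]

def dependency_contexts(tokens, arcs, pos):
    """Words linked to tokens[pos] by a dependency arc, tagged with the relation."""
    contexts = []
    for head, rel, dep in arcs:
        if head == pos:
            # tokens[pos] governs the dependent.
            contexts.append(f"{tokens[dep]}/{rel}")
        elif dep == pos:
            # tokens[pos] is governed by the head; mark the inverse direction.
            contexts.append(f"{tokens[head]}/{rel}-1")
    return contexts

target = tokens.index("discovers")
print(linear_contexts(tokens, target, window=2))
print(dependency_contexts(tokens, arcs, target))
```

With a window of 2, the linear contexts of "discovers" include the nearby words "a" and "distant" but miss its object "star", whereas the dependency contexts contain exactly its subject and object. Shrinking `window` to 1 leaves only "scientist" and "a".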
Kiela & Clark (2014)<ref>Kiela, D., & Clark, S. (2014, April). A systematic study of semantic vector space model parameters. In Proceedings of the 2nd Workshop on Continuous Vector Space Models and their Compositionality (CVSC) at EACL (pp. 21-30).</ref> compared these two types of context (which they called window and dependency contexts) and obtained mixed results. Levy & Goldberg (2014)<ref>Levy, O., & Goldberg, Y. (2014). Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Vol. 2, pp. 302-308).</ref> showed that dependency-based contexts are better at distinguishing between similarity and association, and between "domain" and "function" similarity (though the datasets they used were flawed<ref>http://clic.cimec.unitn.it/marco/publications/acl2015/pham-et-al-cphrase-acl2015.pdf</ref>).

Window sizes

Kiela & Clark (2014) agreed with previous work that smaller windows are generally better.

Algorithm

: ''See List of distributional semantics algorithms''

Other distinctions

Turney & Pantel (2010)<ref>Turney, P., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188. Retrieved from http://www.aaai.org/Papers/JAIR/Vol37/JAIR-3705.pdf</ref> divided vector space models into three types, according to whether they measure similarity of documents, similarity of words, or similarity of relations. This categorization is out of scope here because this wiki is concerned only with word, phrase, and sentence meaning.

References